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Analysis of the biological gene networks involved in a disease may lead to the identification 
of therapeutic targets. Such analysis requires exploring network properties, in particular 
the importance of individual network nodes (i.e., genes). There are many measures that 
consider the importance of nodes in a network and some may shed light on the biological 
significance and potential optimality of a gene or set of genes as therapeutic targets. 
This has been shown to be the case in cancer therapy. A dilemma exists, however, in 
finding the best therapeutic targets based on network analysis since the optimal targets 
should be nodes that are highly influential in, but not toxic to, the functioning of the entire 
network. In addition, cancer therapeutics targeting a single gene often result in relapse 
since compensatory, feedback and redundancy loops in the network may offset the activity 
associated with the targeted gene. Thus, multiple genes reflecting parallel functional 
cascades in a network should be targeted simultaneously, but require the identification of 
such targets. We propose a methodology that exploits centrality statistics characterizing 
the importance of nodes within a gene network that is constructed from the gene 
expression patterns in that network. We consider centrality measures based on both graph 
theory and spectral graph theory. We also consider the origins of a network topology, and 
show how different available representations yield different node importance results. We 
apply our techniques to tumor gene expression data and suggest that the identification of 
optimal therapeutic targets involving particular genes, pathways and sub-networks based 
on an analysis of the nodes in that network is possible and can facilitate individualized 
cancer treatments. The proposed methods also have the potential to identify candidate 
cancer therapeutic targets that are not thought to be oncogenes but nonetheless play 
important roles in the functioning of a cancer-related network or pathway. 

Keywords: network analysis, centrality, cancer, pathway, drug targets, personalized treatment, gene expression 



1. INTRODUCTION 

Treating many forms of cancer effectively is notoriously difficult 
as most tumors have complex cellular dysfunctions replete with 
compensatory and redundancy mechanisms that contribute to 
tumor growth despite some aspect of the tumor being targeted 
for destruction by an anti-cancer therapeutic agent. Thus, while 
many cancer treatments seem effective when first administered, 
relapses often occur, particularly in later stages of tumor develop- 
ment. This general "robustness" of biological networks in tumor 
cells presents true challenges for cancer treatments and cures, 
especially if treatments administered only target a single gene. 
To reduce the likelihood of resistance and the risk of relapse, it 
may be important to target multiple pathways and oncogenes 
simultaneously, but the best way to do this has not been estab- 
lished (Hughes, 2007; Petrelli and Giordano, 2008; Dar et al., 
2012). 

While many tumors have certain pathologies and dysfunc- 
tional pathways in common, the specific mechanisms contribut- 
ing to the growth of any one tumor are often distinctive and 
subtle. However, the identification of these mechanisms and 
the characterization of their contributions to individual tumor 



growth and treatment resistance can be greatly aided through 
the use of modern genomic assays and pathway analyses. Assays 
such as DNA sequencing, RNA sequencing, copy number varia- 
tion assays, and proteomic profiling can reveal phenomena such 
as damaging mutations in oncogenes, resistance gene amplifi- 
cations, and abnormal silencing of tumor suppressor genes. In 
conjunction with these assays, network and pathway analyses 
methods can reveal connections between different perturbations 
in tumors and may suggest interactions between genes that, if 
targeted simultaneously with different therapeutic compounds, 
could disrupt the network integrity of the tumor cells and lead 
to more effective interventions. 

The best way to assess connections between multiple perturba- 
tions in tumors that could be targeted simultaneously is an open 
question. However, analyses of the principal properties, behavior 
and structures associated with biological networks within tumors 
may lead to the identification of more optimal therapeutic tar- 
gets. Of the measures that one could consider in evaluating the 
properties of a tumor gene network, those focusing on network 
integrity are of particular interest. Network integrity analysis can 
lead to the identification of central gene nodes or gene hubs within 
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Network Inference 

• Q: What types of biological networks have been inferred in the paper? 

• A: We use gene expression data in conjunction with cancer-related signaling pathways to infer tumor-specific net- 
works. The extracted tumor-specific networks help us further infer critical nodes (genes) and potential therapeutic 
targets for specific types of tumors or tumor cells. 

• Q: How was the quality/utility of the inferred networks assessed? 

• A: We compare and contrast the predictions with those derived using canonical pathways. We further compare 
the predictions on various normal tissues, tumor types and tumor cells. We also assess the results using multiple 
pathway/network databases. 

• Q: How were these networks validated? 

• A: Many of the targets predicted from the networks have supporting evidences in the literatures: they are either 
implicated as oncogenes or known targets of cancer treatments. 



the network that contribute to the maintenance and growth of 
a tumor in critical ways (Jeong et al, 2001; Agoston et al., 2005; 
Perumal et al, 2009; Horvath, 2011; Li et al, 2011). For exam- 
ple, genes that are critical to the formation and growth of tumors 
have been observed to code for proteins that have increased levels 
of connectedness with other genes as well as greater centrality (i.e., 
occupying a more central place in the network rather than being 
on the periphery of the network) than genes that do not con- 
tribute to tumor growth and formation (Jonsson and Bates, 2006; 
Sun and Zhao, 2010; Xia et al., 2011). However, it has also been 
shown that most disease genes do not necessarily code for proteins 
that are hubs within a network, suggesting that some network 
characteristics may be better indicators of optimal therapeutic 
gene targets than others (Goh et al., 2007). In addition, most 
network analyses have been performed on comprehensive and 
generic interaction information rather than on networks or path- 
ways specific to individual tumors, calling into question which 
type of network topology or representation an analysis should be 
pursued with. It is noteworthy, however, that network centrali- 
zes have also been used to derive integrated gene signatures for 
breast cancer (Wang et al., 2011) and, in the context of signaling 
pathways, centrality-based analysis approaches have been used to 
identify enriched pathways from gene expression data (Gu et al., 
2012), suggesting that different data types and approaches may 
provide complementary insights. 

We assess the properties and characteristics of a cancer net- 
work topology based on gene expression data across a variety 
of tumors with subsequent analyses confined to specific types 
of tumors or tumor cells. We contrast the results of the use of 
different measures of network integrity on the ability to iden- 
tify therapeutically meaningful gene targets in cancer networks. 
Our ultimate goal was to determine if it is possible to make 
compelling claims about the existence of gene targets that might 
be optimal for therapeutic intervention based on the network 
characteristics. We rank genes (i.e., nodes in the network) and 
edges based on their influences on network function and topol- 
ogy defined by various measures, and illustrate that centrality 
analysis on signaling pathways may provide additional insights 
to that based on protein-protein interaction (PPI) networks. One 
of the potential uses of network topology analyses like those 
we pursued is to identify targets that are not necessarily known 
to be directly cancer-related but may influence tumor growth 



nonetheless. Thus, in addition to common measures of network 
centrality which focus on cancer-related genes, we also investigate 
the utility of centralities based on spectral graph theory, includ- 
ing spectral gap centrality, that consider network function in a 
broader context and that have not been explored in the context of 
biological networks to date. 

The remainder of the manuscript is organized as follows. 
Section 2 describes several centrality measures based on both 
graph theory and spectral graph theory, as well as the con- 
struction of network centralities based on gene expression data. 
Section 3 contrasts the critical nodes (i.e., genes) and edges 
defined and determined by different measures in cancer PPI sub- 
networks and pathways, pathways from different sources, and 
pathways conditioned on specific tissues and tumor cell lines. 
Section 4 summarizes the main observations and issues, and 
makes recommendations. We note that some of the terminol- 
ogy used in the literature and ways of referring to network 
components are often ambiguous. We use network and pathway 
interchangeably, although network often corresponds to the actual 
topology associated with a biological pathway. Also, when refer- 
ring to nodes in a network (pathway) we are referring to individual 
genes and their place in the topology associated with a network 
(pathway). 

2. MATERIALS AND METHODS 
2.1. CRITICAL NODES IN A NETWORK 

Network centralities are important structural attributes of a net- 
work. They can be exploited in analyses evaluating network 
robustness and reflect how much a network is connected and, 
importantly, how network functionality might be affected locally 
or globally if certain nodes or connections in the network are dis- 
rupted. There are many types of centrality measures (Freeman, 
1978/1979; Koschiitzki and Schreiber, 2008; Horvath, 2011) and 
they are often used in different contexts. In biological network or 
pathway analysis, potential drug targets are expected to be highly 
influential nodes such that perturbing these nodes will have a 
major effect on network integrity and the flow of information 
through that network. These nodes might correspond to genes 
that affect many other genes in the network, or they could be asso- 
ciated with network fragility in the sense that if they are perturbed 
the network cannot function as a whole. Such highly influential 
nodes in a network or pathway might also be toxic to the entire 
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network and lead to a complete inability of the network to func- 
tion if perturbed. Such complete dysfunction might induce more 
harm than good if it is a network that normal, non-tumor cells 
require in order to function properly. In this light, it might be bet- 
ter to target nodes or genes that influence the most critical nodes 
in a network and not the actual critical nodes themselves. Among 
the various measures of network centrality that have been pro- 
posed in the literature, we primarily focused on the four measures 
described briefly below. 

2. 1. 1. Degree centrality 

The simplest and the most common measure of node importance 
in the context of a specific network topology is degree centrality. 
Consider a network defined as a simple graph G = (V, E) with 
n = \ V\ nodes and |£| edges. The degree of node v e V is the 
number of edges incident to v. Mathematically, the graph G can 
be represented as an adjacency matrix A(G), defined as 



Aij. 



i: 



ift.je V,{i,j] 
otherwise, 



e E, 



where 1 < i, j < n. Note that in discussions of the adjacency 
matrix, we will often refer to node v, as node i and use these two 
notations interchangeably. The degree centrality of node i is then 
defined as c^i) = ^ a,j and reflects how well a node is connected 
as well as its likely direct influence on its neighbors. 

2. 1.2. Betweenness centrality 

The betweenness centrality is defined as the frequency with which 
a node is on the shortest path between two other nodes (Freeman, 
1978/1979). It reflects the likely control of communication between 
other nodes by the node in question. There are definitional and 
operational differences between two types of betweenness central- 
ity measures: node betweenness and edge betweenness. Betweenness 
for node k is defined as following, 



it is connected to Bonacich (1972). The eigenvector centrality of 
node i is 

Ce(i) = a ') C ^ 



where X is the largest eigenvalue of the adjacency matrix A. It 
reflects how well a node is connected to the well-connected nodes 
and how differences in node degrees propagate through a net- 
work. Both Google's PageRank measures and Katz centrality are 
variants of the eigenvector centrality. 

2. 1.4. Spectral gap centrality 

Another measure derived from spectral graph theory was pro- 
posed by Wehmuth and Ziviani (2011). As it is based on the 
spectral gap of sub-networks, we will refer to it as spectral gap 
centrality. The diagonal degree matrix of G, denoted D(G), is 
defined as 

dk if i = j = k. 



Di, 



If 



otherwise, 



where d^ is the degree of node k. The normalized Laplacian 
matrix (Chung, 1997) of graph G, denoted 1(G), is defined as 



0 



otherwise. 



All eigenvalues of L(G) are between 0 and 2, i.e., 0 = Xi(L) < 
~^2(L) < ■ ■ ■ < X„(L) < 2. If G is a single connected component, 
X2CL) (referred to as the spectral gap) is the smallest non-zero 
eigenvalue and is less than 1 if the graph is not complete. X2 
approaches 0 as the graph becomes less connected. The critical 
nodes are nodes with high spectral gap centrality. The spectral 
gap centrality of node i is defined as 



c b (k) = J2 



Sij 



where gij denotes the number of shortest paths between nodes i 
and j, and g,kj denotes the number of shortest paths between i, j 
through node k. Betweenness for edge e is similarly defined as, 



m 



where gi e ; denotes the number of shortest paths between nodes i, j 
through edge e. In contrast to the local effect of degree centrality, 
betweenness captures local connectivity as well as a node's global 
importance to the network. A node or edge of high betweenness 
essentially serves as a gatekeeper that could control the flow of 
information across the network. 

2.1.3. Eigenvector centrality 

The eigenvector centrality is defined as the centrality of a node 
that is proportional to the sum of the centralities of the nodes 



-1 ttt d; > 1, 

00 di = 1. 



where "k\ is the spectral gap of the /z-neighborhood of node i, i.e., 
the subgraph induced by all nodes within h edges from node i, and 
di is the degree of node i. The lower the value c^(i), the more crit- 
ical the node i is to the network. The spectral gap centrality thus 
reflects the neighborhood connectivity, and captures both degree 
and betweenness to some extent depending on the value of h. 

The four centrality measures are chosen primarily for their 
representative characteristics of networks, their direct relevance 
to potential biological functions that we are interested in, and 
the intuitive interpretation of the results. Among other mea- 
sures that might be of interest, closeness (Sabidussi, 1966) and 
radiality (Valente and Foreman, 1998) centralities reflect how 
quickly a node can reach another, which represents a different 
type of functionality. Closeness centrality requires network to 
be strongly connected which is often not the case for pathways. 
PageRank (Page et al, 1999) and Katz status index (Katz, 1953) 
are variants of eigenvector centrality. Another class of centralities 



www.frontiersin.org 



February 2014 | Volume 5 | Article 12 | 3 



Peng and Schork 



Target identification through network integrity 



are motif-based (Koschiitzki and Schreiber, 2008), which repre- 
sent functional substructures, thus are more likely employed in 
specific contexts. 

2.2. THE ORIGINS AND RELEVANCE OF NETWORK AND PATHWAY 
TOPOLOGIES 

The question of which centrality measure yields a better pre- 
diction for therapeutic targets is only one of many important 
questions associated with biological network analyses. A more 
fundamental question is which biological network to interrogate. 
Network analysis can be applied to protein-protein interaction 
(PPI) networks, often derived empirically through experimen- 
tation, or biological pathways that have been described over 
the years. The choice of a particular pathway is also compli- 
cated, since there are multiple versions and subcomponents of 
pathways to choose from. One option is to derive a protein- 
protein interaction subnetwork from the genes of relevance to a 
particular, e.g., phenotype that are grounded in a pathway. An 
alternative is to analyze the pathway topology directly without 
considering the elements associated with a protein-protein inter- 
action subnetwork. Different choices of a network or pathway 
representation — even if chosen to address the same overarching 
questions — will undoubtedly yield different results due to intrin- 
sic differences between PPI subnetwork definitions and pathways. 
In addition, the same pathway defined from different database 
sources, or compiled based on different readings or reviews of 
the literature, may also yield different results due to topological 
differences between the network representations. Further com- 
pounding these issues is the fact that all genes in a pathway are 
not equally expressed in all tissues. Thus, networks constructed 
from one set of resources or experiments may not represent the 
true network topologies associated with different tissues. For the 
identification of critical nodes and genes to be relevant to a par- 
ticular biological setting, tissue-specific network configurations 
might need to be considered. Obviously, if a gene is not expressed 
in a particular tissue of interest, for example, the node in another 
tissue-derived gene expression-based network corresponding to 
that gene and its associated edges must perforce be deleted from 
the network, thus altering the network topology. 

To evaluate the effects of different network representations 
and different network centrality measures on the identifica- 
tion of critical nodes in that network, we analyzed MAPK and 
EGFR signaling pathways and configurations obtained from dif- 
ferent sources. We treated these signaling pathway representations 
as true networks. We obtained pathway information from the 
KEGG (Kanehisa and Goto, 2000) and WikiPathways (Pico et al, 
2008) databases. We obtained a human PPI network from the 
STRING (Mering et al, 2003) database. In order to have the path- 
way representations comparable to PPI network representations, 
we treated them as undirected graphs. Note that the PPI subnet- 
work from a pathway is a subgraph of the entire PPI network 
limited to the nodes corresponding to the intersection between 
genes implicated in the pathway and those present in the PPI 
network. 

To compare and contrast tissue-specific pathways (based on 
the genes expressed in that tissue) and more generic, non- 
expression-based pathways, we analyzed cancer-related pathways 



based on expression patterns obtained from the NCI60 tumor 
cell lines (Scherf et al., 2000). To determine expression patterns 
in the NCI60 cell lines, we applied the Gene expression bar- 
code algorithm (McCall et al, 2011) to the Affymetrix gene 
expression data of each cell line, which yielded an expression 
state (i.e., expressed/unexpressed) for each gene in each cell 
line. In addition, we analyzed pathways conditioned on a set 
of gene expression states and levels obtained from normal tis- 
sues. RNA-Seq data for eleven human tissues were obtained 
from RNA-Seq Atlas (Krupp et al., 2012). A threshold on gene 
expression value RPKM (reads per kilobase of transcript per mil- 
lion mapped reads (Mortazavi et al., 2008)) was used to filter 
genes such that genes with expression levels having an RPKM< 
0.5 were considered unexpressed. Tissue or cell-specific pathway 
information was obtained by removing genes (i.e., nodes in the 
network) corresponding to unexpressed genes from the default 
pathway. 

For each pathway (represented as a network) and each net- 
work centrality measure, the nodes (i.e., genes) within them were 
ranked in two ways: (i) by their centrality values; (ii) by the order 
that they were removed based on an iterative procedure to identify 
their importance in the network. This iterative procedure worked 
by removing top-ranked nodes based on centrality value (along 
with edges incident to the node), reassessing the nodes in the 
network and repeating this process until all nodes were assessed. 

3. RESULTS 

3.1. THE EGFR AND MAPK PATHWAYS IN CANCER 

We ultimately analyzed two different pathways known to have 
pronounced roles in oncogenesis: The epidermal growth fac- 
tor receptor (EGFR) pathway and the mitogen-activated protein 
kinase (MAPK) pathway. Both EGFR and MAPK signaling path- 
ways are well-studied and comprehensively curated, making them 
ideal for our comparison of various methods for assessing node 
importance and therapeutic target potential. We briefly describe 
each below. 

EGFR, also called ErbBl, is a member of the ErbB family 
of receptor tyrosine kinases. The EGFR pathway is one of the 
most important pathways regulating cell growth, differentiation 
and survival (Holbro and Hynes, 2004). Abnormally high lev- 
els of the EGFR protein are frequently found on the surface of 
many types of cancer cells, facilitating the excessive cell division 
that is the hallmark of cancer. The defective regulation of the 
EGFR signal transduction pathway is also known to be associated 
with oncogenesis. EFGR and its signaling components therefore 
offer promising therapeutic targets for various cancers (Citri and 
Yarden, 2006; Scaltriti and Baselga, 2006). 

The MAPK superfamily includes well-conserved kinase genes 
known to be involved in various cellular functions including cell 
growth, proliferation, differentiation, migration and apoptosis. 
They are regulated by four distinct groups of genes in mammals: 
ERK1/2, JNK, p38 and ERK5. While ERK1/2 and ERK5 pathways 
are relatively insulated, JNK and p38 kineses share many of their 
activators, thus the two cascades are more entangled (Chen et al., 
2001; Yang et al., 2003). It has been well-established that aberra- 
tions in MAPK signaling play critical roles in cancer development 
and progression (Dhillon et al., 2007). 
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3.2. PPI SUB-NETWORK VERSUS SIGNALING PATHWAY ANALYSES 

The EGFR network we derived from WikiPathways [EGFR sig- 
naling pathways (Pico et al., 2008; Kandasamy et al., 2010)] has 
235 nodes and 249 edges. The average node degree is 1.06, and 
the graph density (i.e., the fraction of possible edges) is 0.01. The 
PPI subnetwork induced by EGFR pathway has 119 nodes and 
4638 edges. Its average node degree is 39, and the graph den- 
sity is 0.66. We applied the different centrality measures discussed 
above to each network and ranked the nodes (genes) on the basis 
of these measures. Tables 1, 2 list the top-ranked genes (ranked 
between 1 and 10 for at least one measure) obtained from the 
PPI subnetwork and the EGFR pathway, respectively. The left five 
columns reflect degree centrality, node betweenness, eigenvector 
centrality, spectral gap centrality with h = 2 and spectral gap cen- 
trality with h = 3; the right five columns provide corresponding 
measures with top-ranked nodes removed and the remaining sub- 
graphs re-evaluated iteratively. The upper triangular matrix at the 
bottom half of the table gives Spearman's rank correlation coeffi- 
cients assessing the relationship between the results of each pair 
of metrics. This is computed using the actual rankings of all genes 
listed in the table, including those ranked beyond ten. 

As shown by Spearman's p in Table 1, the rankings from differ- 
ent metrics are highly correlated among genes in the PPI subnet- 
work. This can be explained by the properties of the network. The 
PPI network, like many biological networks (Lima-Mendez and 
van Helden, 2009), has the following properties: (i) High-degree 
nodes tend to be connected with other high-degree nodes; (ii) 



The network diameter (i.e., the length of the longest of the short- 
est paths between any two nodes) is usually small. A subnetwork 
shares these properties if it is induced on nodes of high degrees. 
Genes corresponding to high-degree nodes in a PPI network usu- 
ally have systemwide effects and are involved in multiple pathways 
including cancer-related pathways (Han et al, 2004; Barabasi 
et al., 2011). Spectral gap centralities in such a subnetwork are 
largely dominated by node degrees, and eigenvector and between- 
ness centralities also track the degrees for the high-degree nodes. 
Consequently, different centrality metrics on a pathway-induced 
PPI subnetwork are unlikely to yield significant insights beyond 
what is already coded in node degrees. As noted, although high- 
degree nodes in a PPI network may serve as effective drug targets, 
they are also likely to be toxic if perturbed in severe ways, due 
to their system-wide influence, i.e., their likely being involved in 
many cellular functions as they influence many pathways simulta- 
neously. In this light, Wang et al. (2013) showed that the number 
of side effects of a drug is positively correlated with the degree 
and betweenness centralities of that drug's targets in the protein- 
protein interaction network. This observation was found to be the 
case for both cancer and non-cancer drugs. 

In contrast to an analysis of the PPI network, the different cen- 
trality measures produced different node rankings when applied 
to the pathway information, as described in Table 2. Thus, while 
some nodes ranked high in multiple metrics indicating their 
overall importance, there are groups of nodes that rank high 
based on one or another measure, especially with respect to the 



Table 1 | Top ranking genes in EGFR PPI subnetwork by various centrality measures. 
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Cfj, degree centrality; c b , node betweenness; c B , eigenvector centrality; cf, cf, spectral gap centrality h = 2, 3. c^ d b g s| ,Cs 2,3 ' r , node ranking are obtained by 
consecutively removing the top ranked nodes. The bottom matrix is Spearman's rank correlation coefficients. 
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Table 2 | Top ranking genes in EGFR pathway by various centrality measures. 
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betweenness and eigenvector centrality measures. High ranking 
nodes based on the betweenness and eigenvector centrality mea- 
sures appear to be exclusive to each other in the pathways we have 
analyzed. The spectral gap centrality measure tends to capture a 
few nodes ranked high by each of the other three metrics. Similar 
results were observed when we analyzed the MAPK pathways, as 
described in the next section. 

We note that genes (nodes) ranked high exclusively by the 
eigenvector centrality measure (i.e., STAT1, JAK2, JAK1, PIAS3, 
COX2, GRIM19) are all neighbors (directly downstream or 
upstream in the pathway) of STAT3, which plays a leading role 
in cancer inflammation and immunity, and is a validated target 



for cancer therapy (Yu et al, 2009). JAK-STAT signaling is a well 
understood cascade as its aberrant activation has been implicated 
in various types of leukemias, as well as solid tumors (Ferrajoli 
et al., 2006; Sansone and Bromberg, 2012). In addition, it has 
been established that STAT1 overexpression is associated with 
anticancer drug resistance (Khodarev et al., 2012). Interestingly, 
the FDA-approved drug ruxolitinib is a JAK1 and JAK2 inhibitor, 
and more JAK inhibitors are in development (Verstovsek et al., 
2012). Also, PIAS3 overexpression has been shown to inhibit cell 
growth and increase drug sensitivity in lung cancer (Ogata et al., 
2006), and several studies have indicated that COX2 inhibitors 
(NSAIDs and celecoxib) have protective effects against colorectal 
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cancers and breast cancers (Gupta and DuBois, 2001; Arun and 
Goss, 2004; Brown and DuBois, 2005). Finally, Okamoto et al. 
(2010) demonstrated that overexpression of GRIM19 in cancer 
cells suppresses STAT3 -mediated cancer growth. 

As emphasized, the analysis of singular nodes that may be logi- 
cal drug targets in a network is tremendously important in cancer 
therapeutic development. However, targeting multiple signaling 
pathways simultaneously is an essential strategy in managing can- 
cer and reducing the possibility of an individual tumor developing 
drug resistance. It is therefore important to identify critical genes 
in multiple cascades within a network. By removing top-ranked 
nodes that appear to be the most critical for drug response and 
then re-evaluating the remaining subnetworks, additional critical 
nodes that may act as redundancy and compensatory mechanisms 
and contribute to drug resistance can be identified. Interestingly, 
when the betweenness and spectral gap centralities are applied to 
a network in such fashion, the first few critical nodes often reside 
on different paths (cascades) in that network. This phenomenon 
is not as pronounced for the node degree and eigenvector cen- 
trality measures, as their values are affected primarily by a node's 
nearest neighbours in the network and the properties that these 
neighboring nodes have. For example, consider c r s as applied to 
the EGFR pathway (Table 2): for h = 2, the top three nodes are 
EGFR, SRC, and MAPK1 (ERK), belonging to two paths; for 
h = 3, the top nodes are GRB2, STAT3 and MAP2K1 (MEK), 
also on two cascades, the classical MAPK and Jak-STAT cascades. 
We observed similar effects in the analysis of the MAPK pathway 
as detailed in the next section. In this light, nodes ranked high 
by c r 5 alone, such as SH3KBP1, PLCG1, JUN, JUND, might also 
serve as potential therapeutic targets. SH3KBP1 has been impli- 
cated in cell death and shown to mediate down regulation of 
EGFR (Soubeyran et al, 2002; Feng et al, 2011), and JUND has 
been shown to reduce tumor angiogenesis (Gerald et al, 2004). 
We consider the betweenness centrality measure in a separate 
section. 

3.3. DIFFERENT REPRESENTATIONS OF THE SAME NETWORK 

There are often multiple sources for the same biological network 
or pathway. Variations in the topology of a network associated 
with different representations of that network can be attributed 
to, among other things: what genes or proteins (nodes) are 
included in the network; what types of interactions are included 
(e.g., gene-protein, protein-protein, interactions derived from 
correlations in expressions values of genes) and how they are rep- 
resented as edges in the network; and how protein complexes 
are represented. The MAPK pathway can be used to illustrate 
this. The MAPK pathway from KEGG (Kanehisa and Goto, 
2000) has 129 unique nodes and 161 edges (average node degree 
= 1.25; graph density = 0.02), while the same pathway from 
WikiPathways (Pico et al., 2008) is made of 186 nodes and 168 
edges (average node degree = 0.90; graph density = 0.01). Note 
that a pathway is not always represented as one connected com- 
ponent. A main difference between the KEGG and WikiPathways 
representations is that protein-gene complexes are shown as sin- 
gle nodes in the former, while various components of the complex 
appear individually in the latter, and additional nodes, referred to 
as compound nodes subsequently, are used to group the complex 



together. Although the different representations of the MAPK 
pathway have biological appeal, since they exploit and incorporate 
different data types and ways of integrating them, the resulting 
topologies are quite different and obviously affect the ability to 
identify critical nodes in that pathway. For instance, nodes con- 
necting a complex (i.e., compound nodes) in WikiPathways often 
have high degrees. Consequently, the significance of the individ- 
ual nodes in the complex, as well as other nodes, will be affected 
in the identification of critical nodes. Pathways involving differ- 
ent data sources may be represented as compound graphs for 
perhaps a clearer layout and to facilitate more modularized mod- 
eling (Dogrusoz et al., 2005). However, it is unclear how best to 
treat differences between pathway representations in a network 
analysis, especially with respect to what makes the most biolog- 
ical sense, as well as how to interpret the different results. We 
considered analyses involving both the KEGG and WikiPathways 
representations to highlight differences that may result from 
their use. 

Tables 3, 4 list critical nodes identified from the MAPK path- 
way as derived from the KEGG and WikiPathways representa- 
tions. While some nodes ranked high in one pathway but not the 
other, most top-ranked nodes are shared. Their rankings, how- 
ever, are rather different. Of the compound nodes in the MAPK 
pathway from WikiPathways, CASP*, PPP3* and PRKC* rank 
high essentially because they are each connected to multiple indi- 
vidual genes (7, 5, 5 genes, respectively) of a complex, thus having 
relatively high degrees. While compound nodes highly affect the 
degree centrality ranking, other measures, especially the spectral 
gap centrality measures, are less affected (unless average node- 
degree is high, as shown in the analysis of the PPI subnetworks), 
making them more informative and reliable. In addition, the 
spectral gap centrality measure, when applied with a higher h, 
captures nodes with more global rather than local importance. 
For instance, the top three nodes by c r s (h = 3) in the KEGG 
MAPK pathway representation are RAF1, ASK1 and MEKK1, 
which are on the ERK1/2, p38, and JNK cascades respectively. 
Similarly the top three nodes in WikiPathways MAPK pathway 
representation are ERK, MEKK1 and MKK7, which are on the 
ERK1/2 and JNK cascades. As shown in the previous section, 
nodes captured by eigenvector centrality are especially interest- 
ing, particularly if they are not captured by other measures, since 
they are often connected to otherwise critical nodes, thus suggest- 
ing that these nodes have the potential of being a direct influence 
on the behavior of the network. For instance, the MKP (from 
the DUSPs gene family) and PTP genes are ranked high by c e 
alone and ranked 2 and 3 based on an analysis of the KEGG 
MAPK pathway representation, and as the top two c e nodes in 
the WikiPathways representation of the MAPK pathway as well. 
These genes are known to be inhibitors of ERK, JNK and p38, 
thus covering three out of four potential cascades or crucial 
subcomponents of the MAPK pathway. Indeed, PTP genes have 
emerged as drug targets for cancer (Jiang and Zhang, 2008), and 
MKP-DUSP genes have been found to be involved in cancer pro- 
gression and resistance, and have thus also become potential drug 
targets (Bermudez et al., 2010). 

The PPI subnetworks associated with the MAPK pathway, 
based on both KEGG and WikiPathways representations, share 
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Table 3 | Top ranking genes in MAPK pathway (KEGG) by various centrality measures. 
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similar properties with those from EGFR pathways: (i) the aver- 
age node degrees are high; (ii) the graph densities are roughly 
2/3; (iii) the node degrees dominate the critical node rankings 
of various metrics. As a result, the top-ranked nodes are the usual 
suspects, such as AKT1, P53, and RAF1. And for brevity's sake, we 
do not provide detailed descriptions of the results of this analysis. 

3.4. THE IMPORTANT ROLE OF BETWEENNESS CENTRALITY IN 
NETWORK ANALYSES 

Since attacking multiple networks and pathways therapeutically 
in cancer is appropriate and necessary, it is imperative to find 
the critical signaling and major parallel cascade subnetworks. 
Betweenness centralities have the potential to reveal gatekeeping 
nodes or edges that control the flow of signal transduction along 
the cascades. In addition to node betweenness, edge between- 
ness may reveal targets that confer distinct functional advantages. 
It is noteworthy that although many genes/nodes in a network 
can be linked to multiple functions, it may be the case that 
only one of such links is disease-related (Zhong et al., 2009). 
Thus, blocking or perturbing a node with multiple functions may 
have unanticipated effects. The edge betweenness measure may 
offer more information than node importance in this regard. 
In addition, it could identify edges connecting major cascades 
involved in multiple functions. Since it is known that some 
cancer-related genes and proteins are difficult to target with 
small molecules, for example the p53 gene, drugs targeting an 
edge/interaction for which such genes are connected may offer 
ways of indirectly targeting and influencing those genes (Arkin 
and Wells, 2004). 



Our analyses involving betweenness centrality with consecu- 
tive removal of top-ranked nodes or edges is more revealing. For 
instance, in Table 2, the following nodes with high betweenness, 
GRB2, SOS1, HRAS, RAF1, MAP2K1, MAP2K2, and MAPK1 are 
all on the same path. If the top-ranked node is removed and 
betweenness is re-evaluated on the remaining network, we imme- 
diately recognize the critical importance of nodes GRB2 and SRC, 
which are involved in multiple signaling paths in the network. 
Similarly, for analyses involving the edge betweenness centrality 
for the EGFR pathway, while four out of the top five edges by e b 
are on the same path, the top five edges by e r b are on four distinct 
paths (Table 5). 

This phenomenon of nodes and edges gaining or losing impor- 
tance depending on the measure used is even more pronounced 
in the analysis of the MAPK pathways (Tables 3, 6). The MAPK 
pathways include four cascades: classical MAPK pathway (also 
known as ERK1/2 pathway), JNK and p38 MAPK pathway, and 
ERK5 pathway. The top three nodes by c[, MEK2, JNK, ASK1, are 
on three of these cascades (Table 3). Table 6 suggests that while 
four of the top five edges by e b are on the same ERK 1/2 cascade, 
the top three edges by e r b are each on one cascade: Rafl— MEK2 
on ERK1/2, ASK1-MKK2 on p38, MKK-JNK on JNK, while the 
fourth edge MEKK1— MEK2 connects the JNK and ERK1/2 paths 
(see Xu et al., 1995 for details on this link). Thus, these nodes and 
edges do indeed essentially capture the main paths in the MAPK 
network. Note that the ERK5 cascade is presented as a separate 
component and the subgraph is a linear graph. Consequently, 
none of its nodes or edges ranked high in this particular 
analysis. 
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Table 4 | Top ranking genes in MAPK pathway (WikiPathways) by various centrality measures. 
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Nodes that do not correspond to genes directly are omitted. 



Table 5 | High betweenness edges in EGFR pathway. 
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Classical MAPK 


GRB2-SOS1 


Classical MAPK 
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Classical MAPK 
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SRC-GAB2 


SRC/GAB2/PI3K/AKT (Phagocytosis) 
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HRAS-RAF1 


Classical MAPK 


RAF1-MAP2K1 


Classical MAPK 
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RAF1-MAP2K1 


Classical MAPK 


ASAP1-ARF6 


PAG3/ARF6 (Phagocytosis) 



3.5. PATHWAY ANALYSES CONDITIONED ON EXPRESSED GENES IN 
TISSUES AND TUMOR CELLS 

Not all genes are expressed in all tissues and cells. In tumor cells, 
certain genes are amplified, others silenced, often abnormally so. 
Not only do tumor cells differ from normal cells in this regard, but 
they also differ from each other. As such, the same pathway man- 
ifests differently in different cell types: if a gene is unexpressed, 
the encoded protein should be considered non-functional, and 
should be factually deleted from the pathway for an analysis. 
While analyzing the default pathway topology yields invaluable 
insights, tissue or cell-specific pathway topology needs be consid- 
ered for network analysis to be more relevant. The best way to 
construct appropriate networks for cell or tissue-specific analyses 
is an open question, but might be achieved best by constructing 



them de novo from relevant experimental data (Ranola et al., 
2013). 

3.5. 1. EGFR pathway restricted by gene expression levels in the 
NCI60 cell lines 

There are sixty unique cell lines of nine tumor types in 
NCI60 database. We applied the gene expression barcode algo- 
rithm (McCall et al., 2011) to the microarray gene expression 
data of NCI60 cell lines to filter out unexpressed genes. The 
gene expression barcode is essentially a normalization method 
leveraging microarray data in the public domain to answer the 
question: "given an individual microarray experiment of a cell 
type, is a gene expressed or unexpressed in that cell?" Unexpressed 
genes are deleted from the default pathway. For each NCI60 cell 
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Table 6 | High betweenness edges in MAPK pathway (KEGG) . 
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line, between 40% and 60% of the 235 nodes in the default net- 
work made from EGFR pathway were removed after this simple 
analysis. We then evaluated the importance of nodes or edges in 
each individual topology. 

Figure 1A shows gene rankings for the spectral gap central- 
ity measure c r 5 (h = 2) averaged over cell lines for each tumor 
type, where the top row labeled as def provides the rankings in 
the default pathway. While all tumor types are different from 
each other, the patterns of genes expressed and unexpressed in 
them suggest that a network derived from these genes would be 
very different from the default pathway. Essentially, each indi- 
vidual cell line presents unique gene expression patterns as well. 
This diversity requires pathway analyses specific for each individ- 
ual tumor. Figures 1B,C show the gene rankings for individual 
melanoma and breast cancer cell lines respectively. Figures 2, 3 
show top-ranked nodes by eigenvector centrality and top-ranked 
edges by betweenness for each tumor type as well as the individual 
melanoma and breast cancer cell lines. 

Melanoma cell lines can be clustered into three groups by 
c r 5 (h = 2) top-ranked genes, one with EGFR and RAF1 ranked 
high exclusively, the other with MAPK1 at top rank, and the third 
with a mixture of EGFR/RAF 1 /MAPK 1 (Figure IB). Eigenvector 
centrality c e clusters melanoma cell lines quite differently from 
c r s (h = 2) (Figure 2B). Among breast cancer cell lines, MCF7 
is unique when the measure c r s (h = 2) is used to assess the 
EGFR pathway with GRB2 and MAPK1 ranking highest, while 
all the other cell lines have EGFR and/or RAF1 as top rank- 
ing genes (Figure 1C). T47D appears unique by both c e and e/, 
(Figures 2C, 3C). Eigenvector centrality c e yields unique sets of 
genes for each cell line except for genes STAT1/3 and CBL, each 
shared by two cell lines as the top candidates (Figure 2C). CBL 
protein family has been implicated in a number of human can- 
cers and indeed shown to enhance breast tumor formation by 
inhibiting tumor suppressive activity of TGF-P signaling (Kang 
et al., 2012). The application of the edge betweenness measure 
again clusters melanoma and breast cancer cell lines into two to 
three groups, but in different ways than those derived with other 
metrics (Figures 3B,C). 

3.5.2. Analysis of the EGFR pathway restricted to eleven normal 
tissues 

The RNA-Seq Atlas (Krupp et al., 2012) has RNA-Seq data for 
eleven normal human tissues. Hebenstreit et al. (2011) suggests 
that there are two major classes of gene expression levels in most 
cells: lowly expressed, which are likely non-functional, and highly 
expressed, which are likely to be biologically meaningful. The dis- 
tribution of Zog2(RPKM) gene expression values across the eleven 



human tissues is indeed bimodal, suggesting these two major 
classes. Determining a simple threshold for defining unexpressed 
genes, however, is still somewhat arbitrary. We considered a 0.5 
RPKM value as a threshold for differentiating unexpressed vs. 
expressed gene, which is not only often suggested as a conservative 
threshold, but also seems reasonable in this dataset. Figures 4A,C 
show the top ranked genes by spectral gap centrality c r s (h = 2) 
and top edges by the betweenness measure e r b for each tissue. With 
the exception of liver, and to a lesser extent skeletal muscle, the 
rankings of the most critical genes in the other nine tissues are 
quite similar to those from the default pathway, and even more 
similar to each other. The critical edges in tissues differ from those 
from the default pathway, but they are very similar to each other 
with the exception of those of skeletal muscle. Even though the 
data set cannot be compared directly to the NCI tumor cell lines 
for purely technical reasons, the general patterns of node impor- 
tance are markedly different (see Figures 1, 3). With the use of 
a threshold of 0.5 RPKM, around a quarter nodes are deleted 
from the default EGFR pathway. To make the number of nodes 
more comparable to the tumor cell lines, we set a more aggres- 
sive threshold of 3 RPMK so that between 40% and 60% nodes 
are filtered out. Figure 4B shows the result for c r s (h = 2) (edge 
betweenness e r h is omitted due to space limitation). Even though 
there are considerable differences and variations, they are still 
less varied than the tumor cell types (Figures 1A, 2A, 3A). We 
note that expression patterns of a tissue could be the averaged 
expressions over different cell types within the tissue. 

3.5.3. Integrated breast cancer pathway restricted by NCI60 breast 
tumor cells 

The Integrated Breast Cancer pathway incorporates the most 
important proteins for breast cancer. It has 190 unique nodes 
and 348 edges (mean node degree = 1.83; graph density = 0.02). 
Figure 5 shows the top ranking nodes and edges by different 
measures for each NCI60 breast cancer cell line. 

While BRCA1 ranked highest by c d (not shown), c r s (h = 2, 3) 
(h = 3 not shown) for cell lines MCF7, MDA_MB_231 and 
BT_549, MAX (Myc associated factor X) ranked highest by q and 
c r s (h = 3) for HS578T and T47D. It is known that MYC deregula- 
tion contributes to breast cancer development and progression. 
Loss of BRCA1 coupled with MYC overexpression leads to the 
development of breast cancer (Xu et al, 2010) and recent evidence 
has shown that MYC is druggable (Pourdehnad et al., 2013). 

Smad2 ranked high by at least one measure for each cell 
line. Smad genes are highly ranked by c e for all but MCF7 and 
BT_549, for which STAT1 and AR emerged more important 
(in addition to BRCA1). Although Smad2/3/4 signaling plays a 
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FIGURE 1 | Top ranked genes by spectral gap centrality with node 
removal c r s (h = 2) of EGFR pathway conditioned on NCI60 cell line gene 
expression. Ranks range from 1 (dark red) to 10 (blue), and > 10 (the darkest 
blue). (A) Rankings are averaged for each tumor type: BR, breast; CNS, 



tumor suppressor role, it also exhibits a pro-metastatic func- 
tion in breast cancer (Kang et al., 2005). It is also believed that 
Smad-dependent pathway is involved in TGF-F5 tumor suppres- 
sor functions. Various TGF-f5 inhibitors are in development and 
preclinical studies have shown their promises in cancer treat- 
ments (Nagaraj and Datta, 2010). 

Evidence has correlated up-regulation of STAT1 activity with 
increased breast tumor progression and immune suppression in 
tumor microenvironment, thus STAT1 inhibition is a promising 
immune therapeutic target (Hix et al, 2013). Androgen receptor 
(AR) is commonly expressed in breast cancers. It ranked high by 
c e for cell lines MCF7 and BT-549. There is a history of target- 
ing AR for therapy in breast cancer, although the efficacy of AR 
targeted treatments is moderate (Garay and Park, 2012) probably 
due to a lack of clear understanding of the AR signaling mecha- 
nism. For MCF7 cell line though, inhibitory effects of androgens 
targeting AR have indeed been shown in multiple studies (Greeve 
et al., 2004; Macedo et al., 2006). 



central nervous system; CO, colon; LC, non-small cell lung; LE, leukemia; 
ME, melanoma; OV, ovarian; PR, prostate; RE, renal, def, default pathway 
with all nodes. (B) Gene ranking by c r s (h = 2) for NCI60 melanoma cell lines. 
(C) Gene ranking cj.(/i = 2) for NCI60 breast cancer cell lines. 



Notice that since the analysis of the breast cancer pathway is 
conditioned on the gene expression patterns in each cell line, 
major tumor suppressor genes such as P53 and BRCA2 are 
deleted. The exome data of NCI60 (Abaan et al, 2013) cell lines 
shows that each of the five breast cancer cell lines has between 
one to four missense or silencing TP53 mutations, and two to five 
missense or silencing mutations in BRCA2. Only MDA_MB_231 
has a silencing BRCA1 mutation. If we analyze the default breast 
cancer pathway instead of the pathways built only from genes 
expressed in the cell lines, the top three gene nodes are P53, 
AKT1 and BRCA1 based on the q or c e measures, or CERK1, 
SMAD2 and AKT1 by the c r s (h = 2) measure, respectively. The 
top two ranked edges based on the betweenness measure (with 
edge removal) are the TGFR1-SMAD4 and P53-C9JNK1 edges. 

4. DISCUSSION 

The identification of genes that are optimal or logical therapeu- 
tic targets in tumors based on genomic information is crucial for 
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FIGURE 2 | Top ranked genes by eigenvector centrality c e of EGFR 
pathway conditioned on NCI60 cell line gene expression. Ranks 
range from 1 (dark red) to 10 (blue), and > 10 (the darkest blue). 



(A) Gene ranking by c e for tumor types. (B) Gene ranking c e for 
NCI60 melanoma cell lines. (C) Gene ranking c a for NCI60 breast 
cancer cell lines. 



individualizing cancer treatments. We explored the utility of net- 
work centrality analysis of standard pathways and pathways based 
on gene expression information in identifying potential thera- 
peutic targets for a tumor. We also described the complexity of, 
and issues associated with, such analysis. We considered ranking 
genes in a network or pathway either by their centrality values or 
by iteratively recording the top-ranked node and reevaluating the 
remaining subnetwork with the highest ranked node removed. 
When analyses are performed on PPI subnetwork created from 
genes associated with a specific pathway, the top ranked genes 
based on different node importance measures are highly posi- 
tively correlated. We observed a similar phenomenon when PPI 
subnetworks derived from genes that have been implicated in par- 
ticular types of cancers were assessed, both when using the genes 
in these PPI subnetworks alone and by expanding these subnet- 
works by including nodes one or two edges from the seed genes 
used to create the PPI subnetwork (data not shown). The high- 
degree nodes in a PPI network are critical to the functioning of 
that network, and thus are likely to be important drug targets. 
However, such nodes are not likely to be specific to a particular 
pathway and as such targeting them therapeutically could also be 
potentially toxic to a patient. 



When applied to a signaling pathway, various measures of cen- 
trality yield different sets of important genes and the rankings 
of these genes across different node importance measures are 
much less correlated. This lack of correlation among node impor- 
tance measures may provide more insight into the functioning 
of a network or pathway since the different measures may be 
capturing different aspects of information flow through the net- 
work. However, a possible confounding factor in the analysis of 
node importance in networks is that the same pathway may be 
represented in different ways across different databases, leading 
to different network topologies. It is unclear how to determine 
which topology is the best representation of a pathway in such 
cases. 

In the context of different measures of node importance, 
eigenvector centrality has the potential to reveal nodes that may 
impact other highly influential nodes (for instance nodes of high 
degree). These other nodes may reflect genes that could serve as 
alternative therapeutic targets when the highest ranked nodes or 
genes are hard to target or possibly be toxic to the system as a 
whole if targeted therapeutically directly. Identifying these alter- 
native important nodes using eigenvector centrality should be 
done on the pathway without iteratively deleting nodes or those 
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FIGURE 3 | Top ranked edges by betweenness with edge removal e' b 
of EGFR pathway conditioned on NCI60 cell line gene expression. 

Ranks range from 1 (dark red) to 5 (blue), and > 5 (the darkest blue). 



(A) Edge ranking by e r b for tumor types. (B) Edge ranking by e r b for 
NCI60 melanoma cell lines. (C) Edge ranking by e' b for NCI60 breast 
cancer cell lines. 



alternative nodes are not likely to be discovered. For instance, 
while SRC, STAT3, EGFR and GRB2 ranked the highest in the 
EGFR pathway by two or more measures, STAT1, JAK1/2, PIAS3, 
COX2 and GRIM19, all being neighbors of STAT3, ranked within 
top ten exclusively by the eigenvector centrality. Each of these 
genes has been implicated in some type of cancers and some are 
known targets of cancer treatments. As mentioned in Section 3.2, 
Ruxolitinib, an FDA-approved drug for treatment of a type of 
bone marrow cancer, is a JAK1/2 inhibitor (Mesa, 2010). NSAIDs 
and Celecoxib are COX2 inhibitors and have protective effects 
against colorectal and breast cancers (Gupta and DuBois, 2001; 
Aran and Goss, 2004; Brown and DuBois, 2005). In addition, 
Hide et al. (2011) showed that the combination of a PTGS2 
(COX2) inhibitor and an EGFR inhibitor prevented tumorgenesis 
of oligodendrocyte lineage-derived glioma-initiating cells. Finally, 
Li et al. (2013) demonstrated that micro RNA-26b might act as a 
tumor suppressor in breast cancer by targeting PTGS2. 

Nodes ranked high by the betweenness measure with iterative 
node removal are often on parallel cascades in the pathway, which 
are important for simultaneously targeting multiple pathways in 
cancer treatment. The top three nodes identified in this fash- 
ion in MAPK pathways, for example, are MEK2, JNK and ASK1, 
which reside on ERK1/2, JNK and p38 cascades respectively. Edge 
betweenness generates potential edge-specific, or edgetic targets, 
which are more specific to a particular pathway and the nodes 
implicated in these edges might provide an alternative for ther- 
apeutic targeting if the highest ranked individual nodes are hard 
to target. Similarly, edges identified as important by iterative edge 
removal tend to reside on separate paths. 



Although high degree nodes are very important to the func- 
tioning of a network, they are also more prone to differ if local 
changes in a network topology are made. The spectral gap cen- 
trality measure, on the other hand, is less sensitive to local degree 
changes, and is more reliable if slightly different network topolo- 
gies are considered. The spectral gap centrality measure also 
captures both degree and betweenness phenomena simultane- 
ously, thus complementing betweenness measures when used in 
isolation in an important way. This is particularly true in the con- 
text of signaling pathways where the betweenness measures tend 
to capture fragile nodes and edges. The choice of the parameter h 
in the spectral gap centrality measure calculation is more compli- 
cated and is likely best approached empirically. Smaller values of 
h tend to capture local node importance while larger values of h 
tend to capture more global node importance. For typical path- 
ways and PPI networks, setting h = 2 or 3 is a reasonable choice. 
The spectral gap centrality measure node rankings are also more 
informative when computed with iterative node removal. 

Ultimately, in the context of finding potential therapeutic tar- 
gets for tumors, we firmly believe that network analysis should 
consider cell or tissue specific pathways and networks and not 
rely on generalized or tissue independent canonical pathways and 
networks. In order to assess tissue-specific networks and path- 
ways, we considered the use of the expression levels of genes 
in tissues to filter out unexpressed genes. We did this by using 
either gene expression barcodes based on available array data or a 
RPKM threshold based on RNA-Seq data. When different mea- 
sures of node importance are applied to tissue or cell-specific 
pathways obtained in this way, the resulting top-ranked genes 
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FIGURE 4 1 Top ranked nodes of EGFR pathway conditioned on 
eleven normal human tissues from RNA-Seq Atlas. Ranks range 
from 1 (dark red) to 10 (blue), and > 10 (the darkest blue) 
for (A,B); from 1 (dark red) to 5 (blue), and > 5 (the darkest 



blue) for (C). (A) Node ranking by spectral gap centrality c r s (h = 2) 
with RPKM > 0.5. (B) Node ranking by spectral gap centrality 
c r s (h = 2) with RPKM > 3.0. (C) Edge ranking by betweenness 
with RPKM > 0.5 



varied significantly among different cell types. We found that 
variations in node importance between different tumor types are 
generally larger than those variations between different normal 
tissues. This is to be expected given the complex rearrangements 
and perturbations in tumors. For a particular tumor type, anal- 
ysis of different tumor cell lines or subtypes results in different 
nodes deemed crucial or important to a particular pathway. For 
instance, when the integrated breast cancer pathway is restricted 
by the five NCI60 breast tumor cell lines based on their respective 
gene expressions, BRCA1 ranked highest by degree and spectral 
gap centralities for cell lines MCF7, MDA_MB_231 and BT_549, 



while MAX ranked highest by the same measures for cell lines 
HS578T and T47D. SMAD2 ranked high by at least one central- 
ity measure for each of the five cell lines. While SMAD genes 
were highly ranked by eigenvector centrality for MDA_MB_231, 
HS578T and T47D, STAT1 and AR appeared more important for 
MCF7 and BT_549. 

We recognize that there are limitations and caveats in our 
analyses. As more and more RNA sequencing studies are being 
pursued on tumors, a simple threshold used to differentiate 
expressed and unexpressed genes in these tumors will be harder 
to define. Thus, better methods need be explored to determine 
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FIGURE 5 | Top ranked nodes and edges of integrated breast cancer 
pathway conditioned on NCI60 breast cancer cell lines. Ranks range from 
1 (dark red) to 10 (blue), and > 10 (the darkest blue) for (A,B); from 1 (dark 



red) to 5 (blue), and > 5 (the darkest blue) for (C). (A) Node ranking by 
spectral gap centrality c' s (h = 2). (B) Node ranking by eigenvector centrality 
c e . (C) Edge ranking by betweenness e r b . 



which genes might need to be filtered out or included in a path- 
way analysis. While capturing important relevant oncogenes or 
genes impacting oncogenes in a pathway, filtering genes based on 
whether they are expressed or unexpressed in a cell type naturally 
filters out abnormally silenced genes, thus potentially excluding 
malfunctioned tumor suppressor genes in analysis, such as the 
p53 gene. This can be salvaged by analyzing the default path- 
way to some extent. In this light, given the extremely complex 
nature of cancers, finding critical genes in specific pathways is 
just a tiny piece of a puzzle to determine how best to treat can- 
cers. Not only will an analysis of critical nodes in a network 
need be approached with caution, but it should also be used 
in conjunction with other information, such as the analysis of 
DNA sequence mutations, copy number variations and other 
bio-markers. In addition, treating gene expression as a binary 
factor to construct a network's topology for use in an analysis 
of node importance is admittedly a simplistic approach. Rather, 
expression levels and rates of gene amplifications can also be 
incorporated into network analysis. Also, in addition to analyzing 
tumor cells alone, it will likely be more informative to com- 
pare normal and tumor samples to better quantify tumor-specific 
genomic perturbations. Ultimately, we believe our analyses shed 



light on the utility of measures of node and edge importance in 
an analysis of gene networks and pathways in tumor biology and 
cancer treatment choice and hope that they may motivate further 
research in this area. 
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